Application of Bloom filters in Fast Packet Classification
Milind K. Chavan1*, B. S.
Agarkar2
1P.G. Student,
S.R.E.S College of Engineering, Kopargaon, University
of Pune, Pune
2Professor,
Dept. E&Tc Engg.
S.R.E.S’ College of Engineering, Kopargaon,
University of Pune, Pune
*Corresponding Author: milindkc@gmail.com,
bagarkar@yahoo.co.in
ABSTRACT:
Packet classification
finds various applications in computer networks like QoS,
Firewalls, multimedia communication, telecommunication, security, monitoring
data traffic etc. To classify packets to a particular flow or the set of flows,
Intermediate nodes which are present in the network must perform search for a
rule which defines the flow of that particular packet which is chosen on the
basis of different field present in the data packet. The rule set is predefined
by the user, which is constructed on the basis of algorithmic and architectural
methodology. The major constraint in this methodology is searching speed of
particular rule. Since few decades researches are finding the best
computational methodology for packet classification. But the current algorithms
which are used in the packet classification highly rely on expensive and high
power consuming devices like TCAM (Ternary content addressable memory).
Therefore searching of fast and power efficient algorithms for packet
classification is still the subject of interest for researchers.
In this paper we have
delivered a new direction to packet classification which includes algorithmic
and architectural structure for packet classification. Our inception is from
well-known Cross product algorithm which is very fast but introduce additional
rules which increases memory requirement. We have shown how to enhance the crossproduct in a way which drastically reduces this
addition of extra rules, without affecting the throughput of the algorithm,
unnecessary memory access to the off chip memory are avoided by filtering them
through on chip bloom filter.
For packets that
matches p rules in a rule set, our
algorithm requires only P+4+ε memory access to find all the matching
rules. Where ε <<1 which is a constant that depends on small false
positive rate of Bloom filter. Using two SRAM chips search speed of 38 million
packets per second can be achieved. For the rule set size of few hundred to
thousand rules an average rules set expansion factor is 1.2 to 1.4 and average
memory consumption is 32 to 45 bytes.
KEY WORDS: Packet classification, TCAM, crossproduct, bloom filter, SRAM.
INTRODUCTION:
Packet classification
is becoming favorite topic of researchers in last few decades as the demand of
large data in communication is increasing day by day, thus we require more and
more sophisticated algorithmic techniques to fulfill this demand. Logically
packet classification technique is nothing but comparing the bit stream given
in the different fields in data packets with the classifier which consist of a
rule set. The comparison is done with the prefix bits and not with the whole
bit stream. After the matched rule is found the respective action is applied on
the packet which is defined in the classifier.
However till this date
none of the computational techniques is not able to eliminate TCAM in real life
application. TCAMs are the storing devices that stores the array of limited
width key. This key is used to search the rule in parallel and produces the
result when any entry matches with the key. Recent TCAMs supports up to 133
million searches per second for 144 bit wide key and can be able to store 128K
keys that are 144 bit wide [6] TCAM devices are costly and consumes 50 times
much power than other devices but they are still favorite choice of
manufacturer. They are also 15 times more bulky than SRAM.
In this paper we are
implementing a new logarithmic method i.e. Crossproduct
algorithm. In this algorithm a single data structure is created for a single
and number of data structure are constructed as per the no of field to be
checked according to the classifier. But the drawback is it create more amount
of additional rules. Which require significantly large amount of memory. So to
avoid this unnecessary usage of memory by additional rules, multiple subsets of
data structure (trie) are created, which drastically
reduces the unnecessary usage of memory. In cross producting
multiple subset algorithm the results are used to form a key, which is used to
lookup in lookup table to find matched rule.
To achieve this we
will first look up the prefix for each field separately using bloom filter
which is fast and memory efficient searching technique. Therefore, with a very
high probability the longest prefix matching can be performed on the source and
destination addresses and the source and destination port in just four memory
accesses.
To reduce the memory
consumption we have divided the rules into multiple subsets and then
constructed cross product lookup table. As the rules are distributed into no. of
sets, we need to perform lookup in each subset for which we can use Bloom
filter. This computational technique will avoid the unnecessary lookups in
those subset which do not match the prefix address.
To reduce memory
requirement, we divide the rule in multiple subset and then construct a cross
product table for each subset. This will reduce the requirement of additional
rules in cross product. As we divided the rules in no of subsets and then
constructed a cross product table. Thus, we require lookup in each subset.
However we can use bloom filter to avoid this extra lookup in the subset which
are not having matching rule, which helps to get high through put from this
algorithm. If multiple rules are matched then we will require only 4 access to
choose the highest priority rule where P is the number of rules that packet can
match.
Packet
classification is the very important element in computer network. As the demand
of digital data is increasing day by day, the speed of transmission and
reception of digital data is also increasing. In order to process this huge
amount of digital data a sophisticated technique should be used. An excellent
survey and classification is given on exiting packet classification algorithm
in [13].Here we will discuss only the algorithm which are close to our
algorithm.
The basic idea for
packet classification algorithm is perfectly explain in [6] which enlight the knowledge of packet classification with a very
good and simple example of real life classifier. In this paper Pankaj Guptan and Mc Keown has
given an example network connected two enterprise network and two internet
service provider across a network access point. By using this examples they
have explained application of packet classification. They have also classified
the basic techniques in 4 types
1) Basic data
structure.
2) Geometric
algorithms.
3) Heristics
based.
4) Hardware based.
Which provide exact
look up performance is related to basic cross product algorithm [11].
The basic concept of
cross product algorithm is that, we have to perform LPM on each field first and
then combine the results of individual to form a key which mapped towards crossproduct table. This best matching rule from the cross
product table can be fetched in only one memory access (cycle). The single
field look up as in the RFC algorithm [6]. The BV [7] and ABV [3] algorithm
uses bit vector intersection to replace the cross product look up TCAM are
largely used in packet classification.
TCAMs are widely used for
packet classification. Latest TCAM devices also include the banking mechanism
to reduce the power consumption by selectively turning off the unused banks.
Traditionally, TCAM devices needed to expand the range values into prefixes for
storing a rule with range specifications. The recently introduced algorithm,
DIRPE [7], uses a clever technique to encode ranges differently which results
in overall lesser rule expansion compared to the traditional method. The
authors also recognized that in modern security applications, it is not
sufficient to stop the matching process after the first match is found but all
the matching rules for a packet must be reported. They devised a multi-match
scheme with TCAMs which involves multiple TCAM accesses.
In paper [6] the
authors have done longest prefix matching using bloom filter to get the
required rule for the packet, in this paper
bloom filters are created on the bases of count of bit present in the
destination IP address. The bloom filter is made like: for 1 bit prefix. 1bit
bloom filter is programmed for 2bit prefix two bit length bloom filter is
programmed and so on. After matching the 1,2,3,…..n
numbers of prefixes one by one the proportional data is collected, if 1bit
prefix is matched with 1bit bloom filter then filter will generate 1bit at its
output, if it do matched then it will generate zero at the output thus 1,2,3………n no. of prefixes are matched
with IP address. Even a single bit in output string of bloom filter is 0 then
it will discard the packet. If matched then, computation is performed on the
output string which is called hashing. After hashing the identity of particular
rule is return.
Deficient cross
product algorithm work as follows. First, the separate trie
is constructed on the bases of fields which are represented in the rule set. In
this trie each node is marked with the prefix involve
in the rule. Let the first trie for the field, search
and second trie for field 2 search. The connection
between the marked nodes is nothing but the rule. At start we perform
independent search for each field in the respective individual trie and find the most specific prefix .i.e. the longest
matching prefix (LPM). After this we create a unique key and use it to index
the cross product rule table. Every rule in cross product table is original
rule on an artificial rule which we generated during crossproducting,
as the cross producting is multiplicative in nature.
This rules either forms matching rule or do not form any rule. Hence on
matching we either get nothing i.e. no match or match. Thus when there is match
present, we always gets the correct rule. This is shown in Figure 1.
Table 1 Basic Classifier Table
|
r1 |
1* |
* |
|
r2 |
1* |
00* |
|
r3 |
01* |
100* |
|
r4 |
101* |
11* |
|
r5 |
101* |
11* |
|
r6 |
00* |
0 |
Figure 1 Illustration of basic cross product algorithm
Here we shown two
dimensional rule set where each field is 4 bit wide for the purpose of
demonstration.
TABLE 2 Representation of cross product table
|
|
1* |
* |
r1 |
|
|
1* |
00* |
r2 |
|
p1 |
1* |
11* |
r1 |
|
p1 |
1* |
100* |
r1 |
|
|
00* |
* |
r6 |
|
p3 |
00* |
00* |
r6 |
|
p4 |
00* |
11* |
r6 |
|
p5 |
00* |
100* |
r6 |
|
|
01* |
* |
|
|
|
01* |
00* |
|
|
|
01* |
11* |
|
|
|
01* |
100* |
r3 |
|
p6 |
101* |
* |
r1
|
|
p7 |
101* |
00* |
r1
r2 |
|
|
101* |
11* |
r5 |
|
|
101* |
100* |
r4 |
These crossproduct algorithm has two deficiencies.
1) A large no of empty
rule.
2) A very large no of
pseudo rule.
The first problem is
eliminated by using hash table. Instead of using direct look up table,as the cross product table maintain all the
possibility that are generated by cross producting,
so that it can index directly to the rule set. Thus maintaining a hash table is
the best way. There are empty rule also, which can be compressed by this
method.
Above all this if we use Bloom filter described in above topic before
Hash table it drastically improves the throughput. Because it require only one
memory access per LPM.
Therefore entire classification process takes 5 memory access with very
high probability to classify a packet.
Figure 3 Representaion of Pseudo Rule and
Original Rule
In the deficient crossproduct algorithm to get
list of matching rule we required only one hash table access. However if we
allow our self to use multiple hash table accesses then we can split the single
subset into multiple smaller rule set and taking cross produdcting
between them. By this method the pseudo rule get reduce significantly as
compared to deficient cross producting algorithm.
This is shown in figure 4.
Figure 4 dividing rule
in separate subset Fig. (a)(b)&(c) to reduce overlap and LPM Tables 3(b)
& Tables 3(b) for individual field.
Table 3(a)
|
|
G1 |
G2 |
G3 |
|
1* |
1 |
1 |
- |
|
00* |
- |
- |
2 |
|
01* |
- |
2 |
- |
|
101* |
3 |
1 |
- |
Table3(b)
|
|
G1 |
G2 |
G3 |
|
* |
- |
0 |
0 |
|
00* |
2 |
0 |
0 |
|
11* |
2 |
0 |
0 |
|
100* |
3 |
3 |
0 |
We divided the rule
set into three subset and within those subset we has taken the crossproducting and inserted those rule which are provided
in ruleset this results inserting P7 pseudo rule in
subset 1 (G1) andP2 in subset 2(G2) all the pseudo rule vanishes and the load
of the extra rule reduces significantlyl. Now one
question arises in our mind how this. Pseudo rule vanishes? This is because the
cross product is by default multiplicative in nature. When the no. of
overlapping prefixes of a field i get reduced by a factor of xi due to partisioning the
resulting reduction in the cross product rule is of the order πxi and here large. After the reduction of the required memory by the cross
productivity, an independent hash table can be prepaired
in which for each independent rule in subset, independent look can be
performed. This splitting insert two extra memory access.
1) An entire LPM process
is perform for all subset.
2) A separate access is required to look up final from
hash table.
After splitting the
rule set into multiple subset LPM is done on an each subset, this will generate
the key which is place in the LPM table for particular field. So, LPM is done
on each subset and a key is obtained. Key is nothing but a no. of prefixes
matched in the subset if no. prefix is matched in subset, then it will simply
take the key as zero. So in practice we can directly skip this subset and move
to next subset. But for the purpose of analysis we will consider all the
matching prefix key as non-zero. After probing this rule directly in hash table
we get multiple unnecessary rules. This problem is avoided by using bloom
filter. We maintain one bloom filter
in chip memory corresponding to each off chip rule subset hash table. We first
tests the bloom filters with the key to be looked up in the subset. If the
filter shows the match subset, then It will simply take the key as zero. So in
practice we can directly skip this subset and move to next subset. But for the
purpose of analysis we will consider all the matching prefix key as non-zero.
After probing this rule directly in hash table we get multiple unnecessary
rules. This problem is avoided by using bloom filter. We maintain one bloom filter in chip memory corresponding to each
off chip rule subset hash table. We first tests the bloom filters with the key
to be looked up in the subset. If the filter shows the match. We took the
longest matching prefix (up the key in the off chip hash table the flow of
algorithm is shown in figure 5.
Our algorithm will classifies a packet in
only 4+p+€, where p is the no of
rules a given packet can match, € is
the small constant that gives false positive match of bloom filter. Our
algorithm is also much power efficient. It consumes an average 32 to 45 bytes
per rule of memory. Hence rule set are large as 128k which can easily supported in less than
5 MB of SRAM. Using two 300 MHz, 36 bit wide SRAM chips packet can be
classified at the rate of 38 million packets per second.
Figure 5 Illustration
of flow of algorithm
Algorithmic
solution are always a better alternative for TCAM for lower cost, less power
consumption and flexibility. Our computational methodology includes multi set crossproducting, which are much better than only crossproducting with insertion of bloom filter which
accelerates computational process of packet cassification.
Our algorithm will classify packets in only 4+p+ ε, where p is the no of rules a
given packet can match, ε is
the small constant that gives false positive match of bloom filter. Our algorithm
is also much power efficient. It consumes an average 32 to 45 bytes per rule of
memory. Hence rule set are large as 128k
can easily supported in
less than 5 MB of SRAM. Using two 300 MHz, 36 bit wide SRAM chips packet can be
classified at the rate of 38 million packets per second.
[1]
IDT
Generic Part: 71P72604. http://www.idt.com/?catID=58745 &genID=71P72604.
[2]
IDT Generic Part: 75K72100. http://www.idt.com/?catID=58523
&genID=75K72100.
[3]
Florin
Baboescu and George Varghese. Scalable Packet
Classification. In ACM SIGCOMM, 2001.
[4]
Sarang Dharmapurikar, P. Krishnamurthy, and Dave
Taylor. Longest Prefix Matching using Bloom Filters. In ACM SIGCOMM,
August 2003.
[5]
Will
Eatherton. Fast IP Lookup Using Tree Bitmap. Washington University Master Thesis,
1999.
[6]
Pankaj Gupta and Nick McKeown. Packet classification
on multiple fields. In ACM SIGCOMM, 1999.
[7]
T.
V. Lakshman and D. Stiliadis.
High-speed policy-based packet forwarding using efficient multi-dimensional
range matching.In ACM SIGCOMM, 1998.
[8]
K. Lakshminarayanan, Anand Rangarajan, and Srinivasan Venkatachary. Algorithms for Advanced Packet Classification
using Ternary CAM. In ACM SIGCOMM, 2005.
[9]
Haoyu Song, Sarang Dharmarpurikar,
Jonathan Turner, and John Lockwood. Fast Hash Table Lookup Using Extended
Bloom.
[10]
Filter:
An Aid to Network Processing. In ACM SIGCOMM, 2005.
[11]
V. Srinivasan, Subhash Suri, and George Varghese. Packet Classification Using Tuple Space Search. In ACM SIGCOMM, 1999.V. Srinivasan, George Varghese, Subhash
Suri, and Marcel Waldvogel.
Fast and Scalable Layer Four Switching. In ACM SIGCOMM, 1998.
[12]
David
Taylor and Jon Turner. Scalable Packet Classification Using Distributed Crossproducting of Field Labels. In IEEE INFOCOM,
July 2005.
[13]
David
E. Taylor. Survey and taxonomy of packet classification techniques. Washington
University Technical Report, WUCSE-2004, 2004.
[14]
David
E. Taylor and Jonathan S. Turner. Classbench: A
Packet Classification Benchmark. In IEEE INFOCOM, 2005.
[15]
Fang
Yu and Randy H. Katz. Efficient Multi-Match Packet Classification with TCAM. In
IEEE Hot Interconnects, August 2003.
[16]
Balasaheb S. Agarkar, Uday V. Kulkarni “A Novel Technique for Fast Packet Classification”.
International Journal of Computer Applications (0975 – 8887), Volume 76– No.4,
August 2013.
[17]
Fang
Yu, T. V. Lakshman, Martin Austin Motoyama,
and Randy H. Katz. Ssa: a power and memory efficient scheme to multi-match
packet classification. In ANCS ’05:
Proceedings of the 2005 symposium on Architecture for networking and
communications systems, 2005
Received on 27.05.2014 Accepted on 14.06.2014
©A&V Publications all right reserved
Research J. Engineering and Tech.
5(2): April- June 2014 page 77-82